perf(optimizer): generic in-place mutable rewrite to obsolete per-rule fast-paths #22728

Draft
zhuqi-lucas wants to merge 4 commits into apache:main from zhuqi-lucas:perf/in-place-mutable-rewrite

@zhuqi-lucas
Contributor

@zhuqi-lucas zhuqi-lucas commented Jun 3, 2026

Which issue does this PR close?

Closes #22616.

Rationale for this change

#22298 added an in-place Arc::make_mut traversal for the optimizer driver, but only used it when the plan had no subqueries — plans with subqueries fell back to the ownership-based path that destructures and rebuilds each LogicalPlan variant at every recursive layer. Two ApplyOrder::None rules (EliminateCrossJoin, CommonSubexprEliminate) also never got the in-place treatment.

Profiling sql_planner (pre-#22298) showed a non-trivial share of CPU in those ownership-based tree walks. Most of that cost has already been recovered by #22298; this PR closes the remaining two gaps.
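
To make the difference between the two traversal styles concrete, here is a minimal sketch on a toy plan type. `Plan`, `rewrite_owned`, `rewrite_in_place` and `sample` are hypothetical names used only for illustration; they are not DataFusion's types or signatures. The owned walk allocates a fresh `Arc` at every layer even when the rule changes nothing, while the in-place walk reuses the child's allocation through `Arc::make_mut` as long as the `Arc` is uniquely owned.

```rust
use std::sync::Arc;

// Toy stand-in for a logical plan (assumption: not DataFusion's LogicalPlan).
#[derive(Clone, Debug, PartialEq)]
enum Plan {
    Scan(String),
    Filter { predicate: String, input: Arc<Plan> },
}

// Ownership-based walk: destructure the node, rewrite the child, rebuild the node.
fn rewrite_owned(plan: Plan, rule: &impl Fn(Plan) -> (Plan, bool)) -> (Plan, bool) {
    match plan {
        Plan::Filter { predicate, input } => {
            let (child, child_changed) = rewrite_owned(Arc::unwrap_or_clone(input), rule);
            // A new Arc is allocated here even if `child` is unchanged.
            let rebuilt = Plan::Filter { predicate, input: Arc::new(child) };
            let (plan, changed) = rule(rebuilt);
            (plan, changed || child_changed)
        }
        leaf => rule(leaf),
    }
}

// In-place walk: mutate through the existing Arc; no rebuild when nothing changes.
fn rewrite_in_place(plan: &mut Plan, rule: &mut dyn FnMut(&mut Plan) -> bool) -> bool {
    let mut changed = false;
    if let Plan::Filter { input, .. } = &mut *plan {
        // make_mut clones only if the Arc is shared; otherwise it borrows in place.
        changed |= rewrite_in_place(Arc::make_mut(input), rule);
    }
    changed | rule(plan)
}

// Small fixture: Filter over a Scan.
fn sample() -> Plan {
    Plan::Filter {
        predicate: "a > 1".to_string(),
        input: Arc::new(Plan::Scan("t".to_string())),
    }
}
```

With a rule that reports no change, `rewrite_in_place` leaves the child `Arc` pointing at the same allocation, which is the property the in-place path relies on.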

Scope

This PR converts:

  • TopDown / BottomUp rules (~17 of them, all driven by rewrite_plan_in_place) — subquery-bearing plans now go through the in-place path too, no more fallback to owned recursion.
  • EliminateCrossJoin and CommonSubexprEliminate — the two ApplyOrder::None rules with the simple map_uncorrelated_subqueries + map_children pattern.

Not converted (left for future work — each has its own non-uniform recursion shape that doesn't fit the generic helper):

  • OptimizeProjections — required-columns context plumbed top-down.
  • RewriteSetComparison — restructures expressions, introduces aliases.
  • UnionsToFilter — shape-dependent rewrites on Union nodes.

Where the gain actually comes from

Of the two ApplyOrder::None rules converted:

  • CommonSubexprEliminate is where most of the value lands. It has no plan-level fast-path: every plan walks from root to leaf through rewrite_children, including on plans where the rule's own targeted optimization has nothing to do at any node (Join, Repartition, Union, TableScan, Limit, ...). The in-place conversion saves the per-layer destructure/rebuild on every such walk.

  • EliminateCrossJoin is marginal in practice: the plan_has_joins fast-path that landed in #22612 (perf(optimizer): EliminateCrossJoin fast-path for join-free plans) short-circuits the rule entirely on join-free plans, and on plans with joins the rule's main flattening logic uses Arc::unwrap_or_clone directly rather than rewrite_children. rewrite_children only fires on the wrapping non-join layers (e.g. an outer Projection above a join chain), so the gain there is small but non-zero.

The third source of gain is the TopDown/BottomUp path: previously, any plan with even one subquery expression fell back to owned recursion for all ~17 rules in the pipeline. Now those plans stay on the in-place path.

What changes are included in this PR?

Two in-place helpers so all converted rules share the Arc::make_mut path:

  • map_children_and_subqueries_mut — extends map_children_mut to also descend into subquery plans referenced from expressions (Expr::ScalarSubquery, InSubquery, Exists, SetComparison).
  • rewrite_children_in_place — in-place equivalent of the map_uncorrelated_subqueries + map_children pattern. EliminateCrossJoin and CommonSubexprEliminate convert to it.
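
As a rough illustration of what "also descend into subquery plans" means, the sketch below reuses the helper names on toy types. The `Plan` and `Expr` enums, the `bool` return type, and the direct descent into the subquery are all simplifying assumptions; the real helpers operate on `LogicalPlan`, return a `Result`, and reach subqueries through `map_subqueries`.

```rust
use std::sync::Arc;

// Toy expression type: only IN-subquery carries a nested plan (assumption).
#[derive(Clone, Debug)]
enum Expr {
    Column(String),
    InSubquery(Arc<Plan>),
}

// Toy plan type (assumption: not DataFusion's LogicalPlan).
#[derive(Clone, Debug)]
enum Plan {
    Scan(String),
    Filter { predicate: Expr, input: Arc<Plan> },
}

// Visits direct children only; subquery plans inside expressions are skipped.
fn map_children_mut(plan: &mut Plan, f: &mut dyn FnMut(&mut Plan) -> bool) -> bool {
    match plan {
        Plan::Filter { input, .. } => f(Arc::make_mut(input)),
        Plan::Scan(_) => false,
    }
}

// Visits subquery plans referenced from expressions, then the direct children.
fn map_children_and_subqueries_mut(
    plan: &mut Plan,
    f: &mut dyn FnMut(&mut Plan) -> bool,
) -> bool {
    let mut changed = false;
    if let Plan::Filter { predicate: Expr::InSubquery(sub), .. } = &mut *plan {
        changed |= f(Arc::make_mut(sub));
    }
    changed |= map_children_mut(plan, f);
    changed
}
```

On `Filter(IN (SELECT ...))` over a scan, the extended helper visits both the subquery plan and the outer scan, whereas `map_children_mut` alone visits only the outer scan.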

When a rule returns Transformed::no, the parent LogicalPlan variant is no longer destructured-and-rebuilt at every recursive layer.

Subquery descent is gated by a one-shot whole-plan plan_has_subqueries check per rule application: subquery-free plans (the common case for join-heavy workloads) fall through to plain map_children_mut and pay zero subquery overhead — same shape as the #22298 fast path. Plans with subqueries get the new unified in-place recursion. The ApplyOrder::None path additionally has a per-node fast-path inside map_children_and_subqueries_mut to skip the mem::take + map_subqueries roundtrip when a node carries no subquery expressions.
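
The gating described above can be sketched as follows. This is a toy model under stated assumptions: `Node`, `optimize`, and the bottom-up visiting order are illustrative, and only the names `plan_has_subqueries`, `has_subquery_expressions` and `rewrite_plan_in_place` are borrowed from the description. The whole-plan check runs once, and the resulting flag decides whether the recursion looks at subqueries at all.

```rust
use std::sync::Arc;

// Toy plan node: subquery plans and direct children are kept in separate lists.
#[derive(Clone, Debug, Default)]
struct Node {
    subqueries: Vec<Arc<Node>>,
    children: Vec<Arc<Node>>,
}

impl Node {
    // Cheap per-node check (toy version).
    fn has_subquery_expressions(&self) -> bool {
        !self.subqueries.is_empty()
    }
}

// One-shot whole-plan check over the direct-child tree.
fn plan_has_subqueries(plan: &Node) -> bool {
    plan.has_subquery_expressions() || plan.children.iter().any(|c| plan_has_subqueries(c))
}

// In-place recursion; subquery descent happens only when the flag is set.
fn rewrite_plan_in_place(
    plan: &mut Node,
    has_subqueries: bool,
    rule: &mut dyn FnMut(&mut Node) -> bool,
) -> bool {
    let mut changed = false;
    if has_subqueries {
        for sub in &mut plan.subqueries {
            changed |= rewrite_plan_in_place(Arc::make_mut(sub), has_subqueries, rule);
        }
    }
    for child in &mut plan.children {
        changed |= rewrite_plan_in_place(Arc::make_mut(child), has_subqueries, rule);
    }
    changed | rule(plan)
}

// Driver: compute the flag once per rule application, then recurse.
fn optimize(plan: &mut Node, rule: &mut dyn FnMut(&mut Node) -> bool) -> bool {
    let has_subqueries = plan_has_subqueries(plan);
    rewrite_plan_in_place(plan, has_subqueries, rule)
}
```

For a subquery-free plan the flag is `false`, so every recursive step runs only the direct-child loop.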

Are these changes tested?

  • 5 new unit tests on the helpers (subquery descent, change reporting, rule recursion bridge).
  • 716 optimizer unit tests pass.
  • SLT sweep — subquery / cse / joins / predicates / aggregate / tpch (16 files) — all pass.

Are there any user-facing changes?

No behavior change: this is pure perf, with no new config knobs. The only API surface change is that LogicalPlan::has_subquery_expressions is now pub so the optimizer can call it without duplicating the walker.

@github-actions github-actions Bot added the optimizer Optimizer rules label Jun 3, 2026
Generalizes the in-place `Arc::make_mut` traversal adriangb landed
in apache#22298 so all logical-optimizer rules — not just the
no-subqueries fast-path — get the cheaper recursion. Closes apache#22616.

Two pieces:

1. `map_children_and_subqueries_mut` — extends
   `map_children_mut` with a subquery-plan descent. Direct children
   stay on the in-place path; subqueries borrow the plan briefly via
   `mem::take` to reach the ownership-based
   `LogicalPlan::map_subqueries`, whose per-node
   `has_subquery_expressions` fast-path keeps subquery-free nodes
   allocation-free.

   With this in place, `rewrite_plan_in_place` no longer needs the
   `plan_has_subqueries` gate the driver was using — plans with
   subqueries can now share the in-place recursion. Removes
   `plan_has_subqueries`, the `TreeNodeRewriter`-based
   `Rewriter` shim, and the `has_subqueries ? owned : in_place`
   branch in `Optimizer::optimize_internal`.

2. `rewrite_children_in_place` — in-place equivalent of the
   `map_uncorrelated_subqueries` + `map_children` pattern that
   rules with `apply_order = None` use to drive their own
   recursion. `EliminateCrossJoin` and `CommonSubexprEliminate`
   are the two such rules in-tree; both convert.
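
Both pieces rely on bridging a `&mut` reference to an API that takes the plan by value. A minimal sketch of that `mem::take` pattern, assuming a toy `Plan` with a cheap `Default` placeholder (the type, `rewrite_owned`, and `rewrite_via_take` are hypothetical names, and the lowercase rule is just a stand-in):

```rust
use std::mem;

// Toy plan with a cheap placeholder variant (assumption for illustration).
#[derive(Debug, Default, PartialEq)]
enum Plan {
    #[default]
    Empty,
    Scan(String),
}

// Ownership-based rewrite: consumes the plan and returns (new plan, changed).
fn rewrite_owned(plan: Plan) -> (Plan, bool) {
    match plan {
        Plan::Scan(name) if name != name.to_lowercase() => {
            (Plan::Scan(name.to_lowercase()), true)
        }
        other => (other, false),
    }
}

// Bridge: move the plan out of the slot, call the owned API, write the result back.
fn rewrite_via_take(slot: &mut Plan) -> bool {
    let owned = mem::take(slot); // the slot briefly holds Plan::Empty
    let (new_plan, changed) = rewrite_owned(owned);
    *slot = new_plan;
    changed
}
```

In this sketch the placeholder is only visible between the `take` and the write-back; a fallible owned API would need to decide what to leave in the slot on error, which the toy version does not model.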

The net effect: when a rule returns `Transformed::no` for a
child, the parent `LogicalPlan` variant is no longer
destructured-and-rebuilt at every recursive layer — the child's
`Arc` is reused via `Arc::make_mut`. Profile-guided: sql_planner
profiling on `logical_plan_tpch_all` / `optimizer_tpch_all`
showed ~17% of active CPU in `Vec/Box/Arc TreeNodeContainer`
walking; this path is what generates that load.

Net diff: -28 lines (the `Rewriter` shim + `plan_has_subqueries`
together outweigh the two new helpers).

711 optimizer unit tests still pass. SLT broad sweep — subquery /
cse / joins / predicates / aggregate / tpch (16 files) — all
pass. `cargo clippy -p datafusion-optimizer --all-targets --
-D warnings` clean.
The new `map_children_and_subqueries_mut` /
`rewrite_children_in_place` helpers were only covered through the
existing 711 optimizer tests + SLT sweep — fine for behaviour but
not great for pinning each helper's contract on its own.

Five targeted tests:

- `map_children_and_subqueries_mut_no_children` — no children +
  no subqueries returns `Ok(false)` and never invokes `f`.
- `map_children_and_subqueries_mut_walks_direct_children` —
  `Filter`'s `TableScan` input is visited exactly once.
- `map_children_and_subqueries_mut_descends_into_subquery` —
  pins the subquery descent that `map_children_mut` alone is
  missing: `Filter(IN (SELECT ...))` visits the outer `TableScan`
  *and* the subquery plan inside the predicate.
- `map_children_and_subqueries_mut_reports_changes` — closure
  returning `true` propagates as `changed == true`.
- `rewrite_children_in_place_drives_rule_recursion` — bridges
  `&mut LogicalPlan` callback to ownership-based
  `OptimizerRule::rewrite` API and visits each child exactly
  once with a recording rule.
@zhuqi-lucas zhuqi-lucas force-pushed the perf/in-place-mutable-rewrite branch from 3ae2c6d to cc54f98 Compare June 3, 2026 06:18
@zhuqi-lucas
Contributor Author

run benchmark sql_planner

@adriangbot

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4609661850-412-qlffq 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing perf/in-place-mutable-rewrite (cc54f98) to 7bf54db (merge-base) diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                 main                                   perf_in-place-mutable-rewrite
-----                                                 ----                                   -----------------------------
logical_aggregate_with_join                           1.00    454.5±1.67µs        ? ?/sec    1.00    456.3±0.99µs        ? ?/sec
logical_correlated_subquery_exists                    1.01    287.3±1.08µs        ? ?/sec    1.00    285.8±0.53µs        ? ?/sec
logical_correlated_subquery_in                        1.00    287.2±0.76µs        ? ?/sec    1.00    287.5±1.25µs        ? ?/sec
logical_distinct_many_columns                         1.00    576.7±1.36µs        ? ?/sec    1.00    576.2±2.44µs        ? ?/sec
logical_join_4_with_agg_and_filter                    1.04    259.6±1.54µs        ? ?/sec    1.00    250.7±1.06µs        ? ?/sec
logical_join_8_with_agg_sort_limit                    1.02    437.8±1.32µs        ? ?/sec    1.00    428.4±1.84µs        ? ?/sec
logical_join_chain_16                                 1.00    675.1±2.54µs        ? ?/sec    1.01    680.3±2.81µs        ? ?/sec
logical_join_chain_4                                  1.00    122.9±0.34µs        ? ?/sec    1.00    122.3±0.49µs        ? ?/sec
logical_join_chain_8                                  1.01    252.7±0.69µs        ? ?/sec    1.00    251.3±0.78µs        ? ?/sec
logical_multiple_subqueries                           1.00    518.9±2.00µs        ? ?/sec    1.00    521.1±1.25µs        ? ?/sec
logical_nested_cte_4_levels                           1.00    278.6±1.05µs        ? ?/sec    1.01    280.8±1.24µs        ? ?/sec
logical_plan_struct_join_agg_sort                     1.00    182.1±0.68µs        ? ?/sec    1.01    183.2±1.14µs        ? ?/sec
logical_plan_tpcds_all                                1.01     94.2±0.21ms        ? ?/sec    1.00     93.0±0.17ms        ? ?/sec
logical_plan_tpch_all                                 1.05      6.8±0.02ms        ? ?/sec    1.00      6.5±0.02ms        ? ?/sec
logical_scalar_subquery                               1.01    312.7±1.12µs        ? ?/sec    1.00    309.5±0.78µs        ? ?/sec
logical_select_all_from_1000                          1.01    105.2±0.21ms        ? ?/sec    1.00    104.4±0.83ms        ? ?/sec
logical_select_one_from_700                           1.00    324.2±1.68µs        ? ?/sec    1.01    326.1±1.68µs        ? ?/sec
logical_trivial_join_high_numbered_columns            1.00    284.2±1.16µs        ? ?/sec    1.01    285.8±0.86µs        ? ?/sec
logical_trivial_join_low_numbered_columns             1.00    271.7±1.13µs        ? ?/sec    1.01    273.7±0.89µs        ? ?/sec
logical_union_4_branches                              1.00    427.6±1.78µs        ? ?/sec    1.01    430.4±2.90µs        ? ?/sec
logical_union_8_branches                              1.00    812.2±3.01µs        ? ?/sec    1.01    822.4±7.15µs        ? ?/sec
logical_wide_aggregate_100_exprs                      1.00      4.3±0.01ms        ? ?/sec    1.00      4.3±0.01ms        ? ?/sec
logical_wide_case_50_exprs                            1.00      2.4±0.00ms        ? ?/sec    1.00      2.4±0.00ms        ? ?/sec
logical_wide_filter_200_predicates                    1.00   1309.4±6.37µs        ? ?/sec    1.01   1326.6±7.43µs        ? ?/sec
logical_wide_filter_50_predicates                     1.00    393.7±2.04µs        ? ?/sec    1.01    396.1±2.42µs        ? ?/sec
optimizer_correlated_exists                           1.00    248.6±1.47µs        ? ?/sec    1.01    251.2±0.84µs        ? ?/sec
optimizer_join_4_with_agg_filter                      1.00    437.0±2.43µs        ? ?/sec    1.06    465.1±1.30µs        ? ?/sec
optimizer_join_chain_4                                1.00    175.4±0.39µs        ? ?/sec    1.12    196.0±0.67µs        ? ?/sec
optimizer_join_chain_8                                1.00    553.4±1.18µs        ? ?/sec    1.08    596.8±1.58µs        ? ?/sec
optimizer_select_all_from_1000                        1.00      4.7±0.01ms        ? ?/sec    1.00      4.7±0.02ms        ? ?/sec
optimizer_select_one_from_700                         1.00    249.4±0.58µs        ? ?/sec    1.03    257.9±0.59µs        ? ?/sec
optimizer_tpcds_all                                   1.00    292.7±0.59ms        ? ?/sec    1.03    301.0±1.49ms        ? ?/sec
optimizer_tpch_all                                    1.00     15.4±0.06ms        ? ?/sec    1.03     15.8±0.03ms        ? ?/sec
optimizer_wide_aggregate_100                          1.00      2.1±0.01ms        ? ?/sec    1.04      2.2±0.00ms        ? ?/sec
optimizer_wide_filter_200                             1.00      3.5±0.02ms        ? ?/sec    1.05      3.7±0.00ms        ? ?/sec
physical_intersection                                 1.00    577.4±3.83µs        ? ?/sec    1.03    596.1±2.29µs        ? ?/sec
physical_join_consider_sort                           1.00   1000.1±3.42µs        ? ?/sec    1.04   1035.6±7.04µs        ? ?/sec
physical_join_distinct                                1.00    264.6±1.09µs        ? ?/sec    1.01    267.3±0.99µs        ? ?/sec
physical_many_self_joins                              1.00      7.6±0.02ms        ? ?/sec    1.01      7.7±0.02ms        ? ?/sec
physical_plan_clickbench_all                          1.00    121.4±0.47ms        ? ?/sec    1.03    124.5±0.46ms        ? ?/sec
physical_plan_clickbench_q1                           1.00  1325.8±16.69µs        ? ?/sec    1.00   1330.4±6.30µs        ? ?/sec
physical_plan_clickbench_q10                          1.00   1919.1±9.28µs        ? ?/sec    1.02   1963.8±6.28µs        ? ?/sec
physical_plan_clickbench_q11                          1.00      2.1±0.01ms        ? ?/sec    1.03      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q12                          1.00      2.2±0.01ms        ? ?/sec    1.03      2.2±0.01ms        ? ?/sec
physical_plan_clickbench_q13                          1.00  1935.4±11.96µs        ? ?/sec    1.04      2.0±0.01ms        ? ?/sec
physical_plan_clickbench_q14                          1.00      2.1±0.02ms        ? ?/sec    1.01      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q15                          1.00      2.0±0.02ms        ? ?/sec    1.01      2.0±0.01ms        ? ?/sec
physical_plan_clickbench_q16                          1.00   1695.0±8.17µs        ? ?/sec    1.02   1720.6±6.19µs        ? ?/sec
physical_plan_clickbench_q17                          1.00   1741.1±6.92µs        ? ?/sec    1.01   1766.8±8.05µs        ? ?/sec
physical_plan_clickbench_q18                          1.00   1601.9±6.85µs        ? ?/sec    1.01   1620.6±7.75µs        ? ?/sec
physical_plan_clickbench_q19                          1.00   1952.9±7.50µs        ? ?/sec    1.01   1970.8±8.14µs        ? ?/sec
physical_plan_clickbench_q2                           1.00   1679.8±9.39µs        ? ?/sec    1.02   1716.1±8.03µs        ? ?/sec
physical_plan_clickbench_q20                          1.00   1474.2±5.41µs        ? ?/sec    1.02   1504.7±6.97µs        ? ?/sec
physical_plan_clickbench_q21                          1.00   1691.9±7.42µs        ? ?/sec    1.00   1697.7±6.26µs        ? ?/sec
physical_plan_clickbench_q22                          1.00      2.0±0.01ms        ? ?/sec    1.01      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q23                          1.00      2.2±0.01ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
physical_plan_clickbench_q24                          1.00      6.6±0.03ms        ? ?/sec    1.01      6.6±0.03ms        ? ?/sec
physical_plan_clickbench_q25                          1.00   1815.6±6.42µs        ? ?/sec    1.01   1831.8±5.69µs        ? ?/sec
physical_plan_clickbench_q26                          1.00   1659.1±6.87µs        ? ?/sec    1.00   1665.0±6.09µs        ? ?/sec
physical_plan_clickbench_q27                          1.00   1831.6±5.62µs        ? ?/sec    1.01   1845.5±5.92µs        ? ?/sec
physical_plan_clickbench_q28                          1.00      2.3±0.01ms        ? ?/sec    1.01      2.3±0.03ms        ? ?/sec
physical_plan_clickbench_q29                          1.00      2.4±0.01ms        ? ?/sec    1.01      2.4±0.02ms        ? ?/sec
physical_plan_clickbench_q3                           1.00   1563.8±7.44µs        ? ?/sec    1.02   1596.9±7.49µs        ? ?/sec
physical_plan_clickbench_q30                          1.00     15.0±0.07ms        ? ?/sec    1.01     15.2±0.08ms        ? ?/sec
physical_plan_clickbench_q31                          1.00      2.3±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
physical_plan_clickbench_q32                          1.00      2.3±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
physical_plan_clickbench_q33                          1.01  1933.5±26.99µs        ? ?/sec    1.00   1922.0±5.92µs        ? ?/sec
physical_plan_clickbench_q34                          1.00  1699.9±11.63µs        ? ?/sec    1.00   1695.6±5.56µs        ? ?/sec
physical_plan_clickbench_q35                          1.00   1766.1±7.70µs        ? ?/sec    1.00   1761.8±7.25µs        ? ?/sec
physical_plan_clickbench_q36                          1.01      2.1±0.01ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q37                          1.00      2.4±0.01ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
physical_plan_clickbench_q38                          1.00      2.4±0.01ms        ? ?/sec    1.02      2.4±0.01ms        ? ?/sec
physical_plan_clickbench_q39                          1.00      2.4±0.01ms        ? ?/sec    1.02      2.5±0.01ms        ? ?/sec
physical_plan_clickbench_q4                           1.00   1388.0±4.76µs        ? ?/sec    1.02   1415.6±5.68µs        ? ?/sec
physical_plan_clickbench_q40                          1.00      3.2±0.01ms        ? ?/sec    1.02      3.2±0.01ms        ? ?/sec
physical_plan_clickbench_q41                          1.00      2.7±0.01ms        ? ?/sec    1.03      2.8±0.02ms        ? ?/sec
physical_plan_clickbench_q42                          1.00      2.9±0.02ms        ? ?/sec    1.01      3.0±0.02ms        ? ?/sec
physical_plan_clickbench_q43                          1.01      3.1±0.03ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
physical_plan_clickbench_q44                          1.01   1498.1±8.32µs        ? ?/sec    1.00   1490.7±7.01µs        ? ?/sec
physical_plan_clickbench_q45                          1.00   1505.2±9.91µs        ? ?/sec    1.00   1497.7±6.54µs        ? ?/sec
physical_plan_clickbench_q46                          1.00   1805.8±6.86µs        ? ?/sec    1.00   1800.9±6.31µs        ? ?/sec
physical_plan_clickbench_q47                          1.00      2.5±0.01ms        ? ?/sec    1.04      2.6±0.01ms        ? ?/sec
physical_plan_clickbench_q48                          1.00      2.7±0.01ms        ? ?/sec    1.02      2.7±0.02ms        ? ?/sec
physical_plan_clickbench_q49                          1.00      2.7±0.01ms        ? ?/sec    1.04      2.8±0.02ms        ? ?/sec
physical_plan_clickbench_q5                           1.00   1530.5±6.25µs        ? ?/sec    1.01   1545.3±6.47µs        ? ?/sec
physical_plan_clickbench_q50                          1.00      2.6±0.01ms        ? ?/sec    1.03      2.6±0.02ms        ? ?/sec
physical_plan_clickbench_q51                          1.00   1862.9±7.26µs        ? ?/sec    1.03   1914.6±7.05µs        ? ?/sec
physical_plan_clickbench_q6                           1.00   1518.3±8.18µs        ? ?/sec    1.02   1552.1±7.46µs        ? ?/sec
physical_plan_clickbench_q7                           1.00   1348.3±6.11µs        ? ?/sec    1.02   1379.1±6.23µs        ? ?/sec
physical_plan_clickbench_q8                           1.00  1836.9±10.78µs        ? ?/sec    1.02   1877.5±6.57µs        ? ?/sec
physical_plan_clickbench_q9                           1.00   1802.4±8.24µs        ? ?/sec    1.03   1852.2±7.96µs        ? ?/sec
physical_plan_struct_join_agg_sort                    1.00   1274.2±4.07µs        ? ?/sec    1.01   1283.1±2.54µs        ? ?/sec
physical_plan_tpcds_all                               1.00    723.4±1.10ms        ? ?/sec    1.01    733.2±1.89ms        ? ?/sec
physical_plan_tpch_all                                1.00     45.2±0.06ms        ? ?/sec    1.01     45.8±0.11ms        ? ?/sec
physical_plan_tpch_q1                                 1.00   1469.4±3.30µs        ? ?/sec    1.01   1478.7±3.42µs        ? ?/sec
physical_plan_tpch_q10                                1.00      2.9±0.00ms        ? ?/sec    1.02      3.0±0.01ms        ? ?/sec
physical_plan_tpch_q11                                1.04      2.2±0.00ms        ? ?/sec    1.00      2.1±0.00ms        ? ?/sec
physical_plan_tpch_q12                                1.00   1201.8±2.95µs        ? ?/sec    1.01   1218.9±3.12µs        ? ?/sec
physical_plan_tpch_q13                                1.00    999.4±2.69µs        ? ?/sec    1.02   1016.6±3.54µs        ? ?/sec
physical_plan_tpch_q14                                1.00   1382.3±3.22µs        ? ?/sec    1.00   1375.7±2.64µs        ? ?/sec
physical_plan_tpch_q16                                1.00   1522.6±1.97µs        ? ?/sec    1.02  1550.5±11.67µs        ? ?/sec
physical_plan_tpch_q17                                1.01  1712.1±13.77µs        ? ?/sec    1.00   1701.0±9.85µs        ? ?/sec
physical_plan_tpch_q18                                1.00  1997.4±13.91µs        ? ?/sec    1.00   1999.9±2.80µs        ? ?/sec
physical_plan_tpch_q19                                1.00   1660.1±3.12µs        ? ?/sec    1.01   1679.7±2.62µs        ? ?/sec
physical_plan_tpch_q2                                 1.00      4.0±0.00ms        ? ?/sec    1.02      4.1±0.01ms        ? ?/sec
physical_plan_tpch_q20                                1.00      2.2±0.00ms        ? ?/sec    1.02      2.2±0.00ms        ? ?/sec
physical_plan_tpch_q21                                1.00      3.0±0.00ms        ? ?/sec    1.02      3.1±0.00ms        ? ?/sec
physical_plan_tpch_q22                                1.02   1565.7±2.58µs        ? ?/sec    1.00   1535.1±2.79µs        ? ?/sec
physical_plan_tpch_q3                                 1.00   1889.2±2.62µs        ? ?/sec    1.03   1941.4±3.72µs        ? ?/sec
physical_plan_tpch_q4                                 1.00   1196.1±2.80µs        ? ?/sec    1.01   1205.6±2.12µs        ? ?/sec
physical_plan_tpch_q5                                 1.00      2.7±0.00ms        ? ?/sec    1.02      2.7±0.01ms        ? ?/sec
physical_plan_tpch_q6                                 1.00    633.2±1.61µs        ? ?/sec    1.01    638.7±1.38µs        ? ?/sec
physical_plan_tpch_q7                                 1.00      3.0±0.01ms        ? ?/sec    1.02      3.1±0.01ms        ? ?/sec
physical_plan_tpch_q8                                 1.00      4.0±0.02ms        ? ?/sec    1.01      4.0±0.01ms        ? ?/sec
physical_plan_tpch_q9                                 1.00      2.8±0.00ms        ? ?/sec    1.02      2.8±0.01ms        ? ?/sec
physical_select_aggregates_from_200                   1.00     15.4±0.03ms        ? ?/sec    1.01     15.6±0.03ms        ? ?/sec
physical_select_all_from_1000                         1.01    114.6±0.61ms        ? ?/sec    1.00    113.2±0.32ms        ? ?/sec
physical_select_one_from_700                          1.00    751.0±2.37µs        ? ?/sec    1.03    770.6±4.27µs        ? ?/sec
physical_sorted_union_order_by_10_int64               1.00      4.3±0.01ms        ? ?/sec    1.02      4.4±0.01ms        ? ?/sec
physical_sorted_union_order_by_10_uint64              1.00      8.4±0.01ms        ? ?/sec    1.04      8.7±0.02ms        ? ?/sec
physical_sorted_union_order_by_50_int64               1.00    107.5±0.29ms        ? ?/sec    1.01    108.9±0.32ms        ? ?/sec
physical_sorted_union_order_by_50_uint64              1.00    356.3±1.08ms        ? ?/sec    1.04    370.2±1.37ms        ? ?/sec
physical_theta_join_consider_sort                     1.00   1030.3±2.10µs        ? ?/sec    1.04   1076.3±6.49µs        ? ?/sec
physical_unnest_to_join                               1.00    648.1±1.95µs        ? ?/sec    1.01    656.9±6.24µs        ? ?/sec
physical_window_function_partition_by_12_on_values    1.00    721.8±2.66µs        ? ?/sec    1.01    729.8±1.22µs        ? ?/sec
physical_window_function_partition_by_30_on_values    1.00   1436.5±4.87µs        ? ?/sec    1.01   1455.3±2.80µs        ? ?/sec
physical_window_function_partition_by_4_on_values     1.00    438.0±0.94µs        ? ?/sec    1.02    448.7±1.08µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00    540.9±2.17µs        ? ?/sec    1.02    553.3±1.78µs        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00    580.8±1.21µs        ? ?/sec    1.02    590.4±1.19µs        ? ?/sec
with_param_values_many_columns                        1.00    429.5±2.26µs        ? ?/sec    1.01    434.0±1.94µs        ? ?/sec

Resource Usage

Metric         base (merge-base)    branch
------         -----------------    ------
Wall time      1565.3s              1565.4s
Peak memory    19.9 GiB             19.9 GiB
Avg memory     19.8 GiB             19.9 GiB
CPU user       1882.4s              1883.5s
CPU sys        2.0s                 1.5s
Peak spill     0 B                  0 B

…sion

Per-node has_subquery_expressions walks were dominating optimizer cost
on subquery-free plans (the common case for join-heavy workloads).
Bench showed optimizer_join_chain_4 +12%, optimizer_tpch_all +3%, even
though logical_plan_tpch_all (full pipeline) was -5%.
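
For reference, these percentages can be recomputed from the mean times in the benchmark table. The helper below is a hypothetical illustration, not part of the benchmark tooling. Because the table prints rounded times, the recomputed figures are approximate: roughly +11.7% for optimizer_join_chain_4, +2.6% for optimizer_tpch_all, and -4.4% for logical_plan_tpch_all, the last of which matches the reported 1.05 ratio only to within rounding.

```rust
// Relative change of the branch time versus the base time, in percent.
// Positive means the branch is slower, negative means it is faster.
fn relative_change_pct(base: f64, branch: f64) -> f64 {
    (branch - base) / base * 100.0
}

// Mean times copied from the table (main vs. branch), units as printed there.
fn reported_changes() -> [(&'static str, f64); 3] {
    [
        ("optimizer_join_chain_4", relative_change_pct(175.4, 196.0)),
        ("optimizer_tpch_all", relative_change_pct(15.4, 15.8)),
        ("logical_plan_tpch_all", relative_change_pct(6.8, 6.5)),
    ]
}
```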

Restore the one-shot whole-plan plan_has_subqueries check at the
driver, propagate has_subqueries: bool through rewrite_plan_in_place,
and dispatch to map_children_mut (no subquery descent) when false.
For the ApplyOrder::None path, also add a per-node fast-path inside
map_children_and_subqueries_mut to skip the mem::take + map_subqueries
roundtrip when the node carries no subquery expressions.

Make LogicalPlan::has_subquery_expressions pub so the optimizer can
reuse it without duplicating the walk.

Also fixes the cargo doc -D warnings CI break (unresolved intra-doc
link to TreeNode::rewrite).
@github-actions github-actions Bot added the logical-expr Logical plan and expressions label Jun 3, 2026
@zhuqi-lucas
Contributor Author

run benchmark sql_planner

@adriangbot

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4610453443-413-jhrwz 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing perf/in-place-mutable-rewrite (02748ae) to e71bd56 (merge-base) diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                 main                                   perf_in-place-mutable-rewrite
-----                                                 ----                                   -----------------------------
logical_aggregate_with_join                           1.00    450.9±2.89µs        ? ?/sec    1.01    455.7±1.22µs        ? ?/sec
logical_correlated_subquery_exists                    1.00    282.0±0.84µs        ? ?/sec    1.01    284.5±0.90µs        ? ?/sec
logical_correlated_subquery_in                        1.00    283.9±2.37µs        ? ?/sec    1.00    284.0±0.87µs        ? ?/sec
logical_distinct_many_columns                         1.00    570.1±1.27µs        ? ?/sec    1.00    572.4±1.49µs        ? ?/sec
logical_join_4_with_agg_and_filter                    1.00    245.4±1.20µs        ? ?/sec    1.02    249.9±0.80µs        ? ?/sec
logical_join_8_with_agg_sort_limit                    1.00    413.0±1.76µs        ? ?/sec    1.01    419.2±1.23µs        ? ?/sec
logical_join_chain_16                                 1.00    666.9±2.59µs        ? ?/sec    1.02    677.1±2.57µs        ? ?/sec
logical_join_chain_4                                  1.00    122.0±0.65µs        ? ?/sec    1.00    121.6±0.46µs        ? ?/sec
logical_join_chain_8                                  1.00    248.5±0.86µs        ? ?/sec    1.01    250.1±0.73µs        ? ?/sec
logical_multiple_subqueries                           1.00    511.3±1.93µs        ? ?/sec    1.01    518.6±1.54µs        ? ?/sec
logical_nested_cte_4_levels                           1.00    266.5±1.14µs        ? ?/sec    1.00    265.9±1.08µs        ? ?/sec
logical_plan_struct_join_agg_sort                     1.00    178.6±0.85µs        ? ?/sec    1.00    178.1±0.64µs        ? ?/sec
logical_plan_tpcds_all                                1.00     91.9±0.15ms        ? ?/sec    1.01     92.7±0.18ms        ? ?/sec
logical_plan_tpch_all                                 1.00      6.4±0.02ms        ? ?/sec    1.01      6.4±0.02ms        ? ?/sec
logical_scalar_subquery                               1.00    307.1±0.93µs        ? ?/sec    1.01    310.6±1.17µs        ? ?/sec
logical_select_all_from_1000                          1.01    105.5±0.17ms        ? ?/sec    1.00    104.2±0.31ms        ? ?/sec
logical_select_one_from_700                           1.00    325.8±4.83µs        ? ?/sec    1.00    326.6±1.54µs        ? ?/sec
logical_trivial_join_high_numbered_columns            1.00    285.3±1.51µs        ? ?/sec    1.01    288.2±1.12µs        ? ?/sec
logical_trivial_join_low_numbered_columns             1.00    272.8±1.92µs        ? ?/sec    1.01    275.7±0.99µs        ? ?/sec
logical_union_4_branches                              1.00    419.5±2.10µs        ? ?/sec    1.01    425.7±1.97µs        ? ?/sec
logical_union_8_branches                              1.00    803.3±2.96µs        ? ?/sec    1.02    818.1±3.24µs        ? ?/sec
logical_wide_aggregate_100_exprs                      1.00      4.3±0.01ms        ? ?/sec    1.00      4.3±0.01ms        ? ?/sec
logical_wide_case_50_exprs                            1.00      2.3±0.01ms        ? ?/sec    1.01      2.4±0.00ms        ? ?/sec
logical_wide_filter_200_predicates                    1.00   1291.6±7.33µs        ? ?/sec    1.01   1310.2±6.63µs        ? ?/sec
logical_wide_filter_50_predicates                     1.00    386.6±2.25µs        ? ?/sec    1.01    388.8±2.35µs        ? ?/sec
optimizer_correlated_exists                           1.00    248.2±1.06µs        ? ?/sec    1.01    249.6±0.62µs        ? ?/sec
optimizer_join_4_with_agg_filter                      1.00    428.5±1.38µs        ? ?/sec    1.05    449.4±1.72µs        ? ?/sec
optimizer_join_chain_4                                1.00    176.1±0.39µs        ? ?/sec    1.03    180.6±0.43µs        ? ?/sec
optimizer_join_chain_8                                1.00    553.1±1.59µs        ? ?/sec    1.03    568.8±0.95µs        ? ?/sec
optimizer_select_all_from_1000                        1.00      4.7±0.02ms        ? ?/sec    1.00      4.7±0.01ms        ? ?/sec
optimizer_select_one_from_700                         1.00    250.0±0.39µs        ? ?/sec    1.04    260.6±0.78µs        ? ?/sec
optimizer_tpcds_all                                   1.00    291.5±0.48ms        ? ?/sec    1.00    291.9±0.27ms        ? ?/sec
optimizer_tpch_all                                    1.01     15.4±0.10ms        ? ?/sec    1.00     15.3±0.03ms        ? ?/sec
optimizer_wide_aggregate_100                          1.00      2.1±0.00ms        ? ?/sec    1.04      2.1±0.00ms        ? ?/sec
optimizer_wide_filter_200                             1.00      3.5±0.00ms        ? ?/sec    1.03      3.6±0.00ms        ? ?/sec
physical_intersection                                 1.00   583.8±12.35µs        ? ?/sec    1.03    598.7±2.41µs        ? ?/sec
physical_join_consider_sort                           1.00   1004.9±2.88µs        ? ?/sec    1.03   1032.8±3.61µs        ? ?/sec
physical_join_distinct                                1.00    267.4±1.43µs        ? ?/sec    1.00    268.2±1.03µs        ? ?/sec
physical_many_self_joins                              1.00      7.5±0.02ms        ? ?/sec    1.03      7.7±0.02ms        ? ?/sec
physical_plan_clickbench_all                          1.00    122.0±0.79ms        ? ?/sec    1.02    123.9±0.78ms        ? ?/sec
physical_plan_clickbench_q1                           1.00   1299.2±6.72µs        ? ?/sec    1.02   1328.4±6.63µs        ? ?/sec
physical_plan_clickbench_q10                          1.00   1910.7±7.47µs        ? ?/sec    1.03   1965.7±5.67µs        ? ?/sec
physical_plan_clickbench_q11                          1.00      2.1±0.01ms        ? ?/sec    1.03      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q12                          1.00      2.2±0.01ms        ? ?/sec    1.01      2.2±0.01ms        ? ?/sec
physical_plan_clickbench_q13                          1.00   1924.7±6.66µs        ? ?/sec    1.01   1943.2±5.91µs        ? ?/sec
physical_plan_clickbench_q14                          1.00      2.1±0.01ms        ? ?/sec    1.01      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q15                          1.00   1979.3±6.28µs        ? ?/sec    1.01   2000.0±6.98µs        ? ?/sec
physical_plan_clickbench_q16                          1.00   1677.5±5.69µs        ? ?/sec    1.01   1686.2±7.17µs        ? ?/sec
physical_plan_clickbench_q17                          1.00   1727.4±5.67µs        ? ?/sec    1.00   1730.9±9.18µs        ? ?/sec
physical_plan_clickbench_q18                          1.00   1590.5±5.59µs        ? ?/sec    1.02   1619.9±6.24µs        ? ?/sec
physical_plan_clickbench_q19                          1.00   1944.8±9.59µs        ? ?/sec    1.02   1980.3±6.97µs        ? ?/sec
physical_plan_clickbench_q2                           1.00   1708.4±6.83µs        ? ?/sec    1.01   1722.4±9.04µs        ? ?/sec
physical_plan_clickbench_q20                          1.00   1485.4±7.09µs        ? ?/sec    1.02   1515.0±7.34µs        ? ?/sec
physical_plan_clickbench_q21                          1.01   1707.0±7.19µs        ? ?/sec    1.00   1696.5±8.64µs        ? ?/sec
physical_plan_clickbench_q22                          1.00      2.0±0.01ms        ? ?/sec    1.05      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q23                          1.00      2.2±0.01ms        ? ?/sec    1.04      2.3±0.01ms        ? ?/sec
physical_plan_clickbench_q24                          1.00      6.7±0.02ms        ? ?/sec    1.00      6.6±0.02ms        ? ?/sec
physical_plan_clickbench_q25                          1.00   1805.6±5.91µs        ? ?/sec    1.04   1874.3±7.43µs        ? ?/sec
physical_plan_clickbench_q26                          1.00   1644.0±5.64µs        ? ?/sec    1.03   1698.3±7.34µs        ? ?/sec
physical_plan_clickbench_q27                          1.00   1822.4±4.90µs        ? ?/sec    1.04   1892.1±6.92µs        ? ?/sec
physical_plan_clickbench_q28                          1.00      2.2±0.01ms        ? ?/sec    1.02      2.3±0.01ms        ? ?/sec
physical_plan_clickbench_q29                          1.00      2.4±0.01ms        ? ?/sec    1.03      2.5±0.01ms        ? ?/sec
physical_plan_clickbench_q3                           1.00   1576.8±5.49µs        ? ?/sec    1.02   1607.0±6.42µs        ? ?/sec
physical_plan_clickbench_q30                          1.00     15.1±0.09ms        ? ?/sec    1.01     15.2±0.07ms        ? ?/sec
physical_plan_clickbench_q31                          1.00      2.3±0.01ms        ? ?/sec    1.02      2.3±0.01ms        ? ?/sec
physical_plan_clickbench_q32                          1.00      2.3±0.01ms        ? ?/sec    1.02      2.3±0.01ms        ? ?/sec
physical_plan_clickbench_q33                          1.00   1878.1±6.34µs        ? ?/sec    1.03   1931.7±6.35µs        ? ?/sec
physical_plan_clickbench_q34                          1.00   1674.9±5.76µs        ? ?/sec    1.02   1705.1±6.15µs        ? ?/sec
physical_plan_clickbench_q35                          1.00   1736.7±4.98µs        ? ?/sec    1.03  1783.7±25.16µs        ? ?/sec
physical_plan_clickbench_q36                          1.00      2.0±0.01ms        ? ?/sec    1.03      2.1±0.01ms        ? ?/sec
physical_plan_clickbench_q37                          1.00      2.4±0.01ms        ? ?/sec    1.03      2.4±0.01ms        ? ?/sec
physical_plan_clickbench_q38                          1.00      2.4±0.01ms        ? ?/sec    1.03      2.4±0.01ms        ? ?/sec
physical_plan_clickbench_q39                          1.00      2.4±0.01ms        ? ?/sec    1.03      2.5±0.01ms        ? ?/sec
physical_plan_clickbench_q4                           1.00   1401.7±6.32µs        ? ?/sec    1.02   1430.7±5.05µs        ? ?/sec
physical_plan_clickbench_q40                          1.00      3.2±0.01ms        ? ?/sec    1.03      3.3±0.01ms        ? ?/sec
physical_plan_clickbench_q41                          1.00      2.7±0.01ms        ? ?/sec    1.04      2.8±0.01ms        ? ?/sec
physical_plan_clickbench_q42                          1.00      2.9±0.01ms        ? ?/sec    1.04      3.0±0.01ms        ? ?/sec
physical_plan_clickbench_q43                          1.00      3.0±0.01ms        ? ?/sec    1.04      3.1±0.02ms        ? ?/sec
physical_plan_clickbench_q44                          1.00   1457.6±5.59µs        ? ?/sec    1.04   1514.5±4.72µs        ? ?/sec
physical_plan_clickbench_q45                          1.00   1472.0±6.64µs        ? ?/sec    1.03   1518.4±7.31µs        ? ?/sec
physical_plan_clickbench_q46                          1.00   1760.4±7.24µs        ? ?/sec    1.04   1823.5±8.35µs        ? ?/sec
physical_plan_clickbench_q47                          1.00      2.5±0.01ms        ? ?/sec    1.03      2.6±0.01ms        ? ?/sec
physical_plan_clickbench_q48                          1.00      2.7±0.01ms        ? ?/sec    1.03      2.7±0.01ms        ? ?/sec
physical_plan_clickbench_q49                          1.00      2.7±0.01ms        ? ?/sec    1.03      2.8±0.01ms        ? ?/sec
physical_plan_clickbench_q5                           1.00   1503.5±4.39µs        ? ?/sec    1.04   1562.5±6.52µs        ? ?/sec
physical_plan_clickbench_q50                          1.00      2.6±0.01ms        ? ?/sec    1.04      2.6±0.02ms        ? ?/sec
physical_plan_clickbench_q51                          1.00   1838.4±6.08µs        ? ?/sec    1.04  1908.1±28.90µs        ? ?/sec
physical_plan_clickbench_q6                           1.00   1507.7±4.46µs        ? ?/sec    1.04   1569.5±7.18µs        ? ?/sec
physical_plan_clickbench_q7                           1.00   1341.6±5.12µs        ? ?/sec    1.04   1394.6±5.45µs        ? ?/sec
physical_plan_clickbench_q8                           1.00   1830.2±5.75µs        ? ?/sec    1.03   1883.9±6.59µs        ? ?/sec
physical_plan_clickbench_q9                           1.00  1818.2±16.70µs        ? ?/sec    1.02   1854.4±6.73µs        ? ?/sec
physical_plan_struct_join_agg_sort                    1.00   1274.5±2.52µs        ? ?/sec    1.00   1278.8±3.28µs        ? ?/sec
physical_plan_tpcds_all                               1.00    726.8±4.03ms        ? ?/sec    1.02    737.8±4.02ms        ? ?/sec
physical_plan_tpch_all                                1.00     45.1±0.07ms        ? ?/sec    1.01     45.7±0.09ms        ? ?/sec
physical_plan_tpch_q1                                 1.00   1472.4±2.72µs        ? ?/sec    1.02   1499.3±2.69µs        ? ?/sec
physical_plan_tpch_q10                                1.00      2.9±0.01ms        ? ?/sec    1.03      3.0±0.01ms        ? ?/sec
physical_plan_tpch_q11                                1.01      2.2±0.00ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_tpch_q12                                1.00   1198.5±8.18µs        ? ?/sec    1.00   1204.2±4.56µs        ? ?/sec
physical_plan_tpch_q13                                1.00    998.4±3.46µs        ? ?/sec    1.02   1018.6±2.49µs        ? ?/sec
physical_plan_tpch_q14                                1.00   1379.7±2.45µs        ? ?/sec    1.01   1400.3±4.17µs        ? ?/sec
physical_plan_tpch_q16                                1.00   1529.9±4.71µs        ? ?/sec    1.00   1536.1±4.88µs        ? ?/sec
physical_plan_tpch_q17                                1.00   1685.1±3.19µs        ? ?/sec    1.01   1700.9±3.08µs        ? ?/sec
physical_plan_tpch_q18                                1.00   1966.2±2.54µs        ? ?/sec    1.01   1977.0±2.84µs        ? ?/sec
physical_plan_tpch_q19                                1.00   1637.1±3.65µs        ? ?/sec    1.02   1671.7±3.91µs        ? ?/sec
physical_plan_tpch_q2                                 1.00      4.1±0.00ms        ? ?/sec    1.01      4.1±0.01ms        ? ?/sec
physical_plan_tpch_q20                                1.00      2.2±0.00ms        ? ?/sec    1.01      2.2±0.00ms        ? ?/sec
physical_plan_tpch_q21                                1.00      3.0±0.00ms        ? ?/sec    1.01      3.1±0.00ms        ? ?/sec
physical_plan_tpch_q22                                1.03   1544.2±2.76µs        ? ?/sec    1.00   1501.6±2.44µs        ? ?/sec
physical_plan_tpch_q3                                 1.00   1901.0±4.25µs        ? ?/sec    1.03  1958.0±12.74µs        ? ?/sec
physical_plan_tpch_q4                                 1.00   1199.4±2.58µs        ? ?/sec    1.01   1213.0±2.89µs        ? ?/sec
physical_plan_tpch_q5                                 1.00      2.7±0.00ms        ? ?/sec    1.03      2.7±0.01ms        ? ?/sec
physical_plan_tpch_q6                                 1.00    628.9±1.78µs        ? ?/sec    1.02    643.6±2.68µs        ? ?/sec
physical_plan_tpch_q7                                 1.00      3.0±0.01ms        ? ?/sec    1.03      3.1±0.00ms        ? ?/sec
physical_plan_tpch_q8                                 1.00      4.0±0.01ms        ? ?/sec    1.03      4.1±0.02ms        ? ?/sec
physical_plan_tpch_q9                                 1.00      2.8±0.01ms        ? ?/sec    1.03      2.8±0.01ms        ? ?/sec
physical_select_aggregates_from_200                   1.00     15.5±0.04ms        ? ?/sec    1.00     15.5±0.04ms        ? ?/sec
physical_select_all_from_1000                         1.01    114.8±0.23ms        ? ?/sec    1.00    113.2±0.31ms        ? ?/sec
physical_select_one_from_700                          1.00    754.3±2.54µs        ? ?/sec    1.03    778.8±2.47µs        ? ?/sec
physical_sorted_union_order_by_10_int64               1.00      4.3±0.00ms        ? ?/sec    1.01      4.4±0.01ms        ? ?/sec
physical_sorted_union_order_by_10_uint64              1.00      8.4±0.03ms        ? ?/sec    1.03      8.7±0.02ms        ? ?/sec
physical_sorted_union_order_by_50_int64               1.00    107.8±0.25ms        ? ?/sec    1.01    108.5±0.29ms        ? ?/sec
physical_sorted_union_order_by_50_uint64              1.00    361.4±2.27ms        ? ?/sec    1.03    372.0±1.56ms        ? ?/sec
physical_theta_join_consider_sort                     1.00   1036.2±3.48µs        ? ?/sec    1.02   1055.2±2.68µs        ? ?/sec
physical_unnest_to_join                               1.00    655.1±2.47µs        ? ?/sec    1.01    659.5±2.89µs        ? ?/sec
physical_window_function_partition_by_12_on_values    1.00    713.8±1.87µs        ? ?/sec    1.01    722.5±1.32µs        ? ?/sec
physical_window_function_partition_by_30_on_values    1.00   1434.1±4.12µs        ? ?/sec    1.01   1446.7±3.12µs        ? ?/sec
physical_window_function_partition_by_4_on_values     1.00    421.6±1.51µs        ? ?/sec    1.04    437.1±1.34µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00    531.7±4.51µs        ? ?/sec    1.03    546.7±1.80µs        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00    574.8±3.13µs        ? ?/sec    1.02    585.6±2.07µs        ? ?/sec
with_param_values_many_columns                        1.00    430.7±2.22µs        ? ?/sec    1.00    431.6±2.44µs        ? ?/sec
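
For reading the table above: the leading number in each column appears to follow the usual critcmp convention, where each mean time is divided by the faster of the two, so the faster side shows 1.00. A small sketch under that assumption (the helper names `relative_ratios` and `fmt_ratio` are hypothetical):

```rust
/// Returns (base_ratio, branch_ratio): each mean time divided by the faster one.
/// Assumes the critcmp-style convention; inputs must be in the same unit.
fn relative_ratios(base: f64, branch: f64) -> (f64, f64) {
    let fastest = base.min(branch);
    (base / fastest, branch / fastest)
}

/// Formats a ratio with two decimals, matching the table's presentation.
fn fmt_ratio(ratio: f64) -> String {
    format!("{ratio:.2}")
}
```

For example, the `optimizer_join_4_with_agg_filter` row (428.5µs on main, 449.4µs on the branch) gives 449.4 / 428.5 ≈ 1.05 for the branch, while `physical_plan_tpch_q22` (1544.2µs vs 1501.6µs) gives ≈ 1.03 for main.
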

Resource Usage

base (merge-base)

Metric Value
Wall time 1570.3s
Peak memory 19.9 GiB
Avg memory 19.8 GiB
CPU user 1886.2s
CPU sys 2.1s
Peak spill 0 B

branch

Metric Value
Wall time 1575.3s
Peak memory 19.9 GiB
Avg memory 19.9 GiB
CPU user 1892.8s
CPU sys 1.6s
Peak spill 0 B

Labels

logical-expr Logical plan and expressions optimizer Optimizer rules
